Annotating Legitimate Disagreement in Corpus Construction

Authors

  • Billy T. M. Wong
  • Sophia Y. M. Lee
Abstract

This paper addresses the resolution of inter-annotator disagreement in corpus construction. Because consistency is regarded as a critical criterion of annotation quality, inter-annotator disagreement is usually considered harmful to the accuracy and reliability of annotation and thus has to be resolved by various means. We claim that strict adherence to consistency also neglects the legitimate disagreement that originates from ambiguity in natural language. We highlight the value of preserving legitimate disagreement in annotation and show that the possible problems resulting from inconsistency are avoidable. A preliminary annotation scheme is suggested for supporting multiple versions of annotation, without giving up the virtue

Similar resources

Annotating Article Errors in Spanish Learner Texts: Design and Evaluation of an Annotation Scheme

Annotating a corpus with error information is a challenging task. This paper describes the design, evaluation and refinement of an annotation scheme for Spanish article errors in learner data, so that future work on corpus annotation and automatic article error detection can progress. To evaluate reliability, 300 noun phrases with definite, indefinite and zero article have been tagged by four a...

Annotating Agreement and Disagreement in Threaded Discussion

We introduce a new corpus of sentence-level agreement and disagreement annotations over LiveJournal and Wikipedia threads. This is the first agreement corpus to offer full-document annotations for threaded discussions. We provide a methodology for coding responses as well as an implemented tool with an interface that facilitates annotation of a specific response while viewing the full context o...

Annotating Orthographic Target Hypotheses in a German L1 Learner Corpus

NLP applications for learners often rely on annotated learner corpora. Thereby, it is important that the annotations are both meaningful for the task, and consistent and reliable. We present a new longitudinal L1 learner corpus for German (handwritten texts collected in grade 2–4), which is transcribed and annotated with a target hypothesis that strictly only corrects orthographic errors, and i...

The Construction of a Chinese Named Entity Tagged Corpus: CNEC1.0

In order to build an automatic named entity recognition (NER) system for machine learning, a large tagged corpus is necessary. This paper describes the manual construction of a Chinese named entity tagged corpus (CNEC 1.0) that can be used to improve NER performance. In this project, we define five named entity tags: PER (person name), LOC (location name), ORG (organization name), LAO (location...

Constructing Evaluation Corpora for Automated Clinical Named Entity Recognition

We report on the construction of a gold-standard dataset consisting of annotated clinical notes suitable for evaluating our biomedical named entity recognition system. The dataset is the result of consensus between four human annotators and contains 1,556 annotations on 160 clinical notes using 658 unique concept codes from SNOMED-CT corresponding to human disorders. Inter-annotator agreement w...

Publication date: 2013